98 research outputs found

    From array-based hybridization of Helicobacter pylori isolates to the complete genome sequence of an isolate associated with MALT lymphoma

    Background: Helicobacter pylori infection is associated with several gastro-duodenal inflammatory diseases of various levels of severity. To determine whether certain combinations of genetic markers can be used to predict the clinical source of the infection, we analyzed well-documented and geographically homogeneous clinical isolates using a comparative genomics approach. Results: A set of 254 H. pylori genes was used to perform array-based comparative genomic hybridization among 120 French H. pylori strains associated with chronic gastritis (n = 33), duodenal ulcers (n = 27), intestinal metaplasia (n = 17) or gastric extra-nodal marginal zone B-cell MALT lymphoma (n = 43). Hierarchical cluster analyses of the DNA hybridization values allowed us to identify a homogeneous subpopulation of strains that clustered exclusively with cagPAI-minus MALT lymphoma isolates. The genome sequence of B38, a representative of this MALT lymphoma strain cluster, was completed, fully annotated, and compared with the six previously released H. pylori genomes (i.e. J99, 26695, HPAG1, P12, G27 and Shi470). B38 has the smallest H. pylori genome described thus far (1,576,758 base pairs containing 1,528 CDSs); it contains the vacA s2m2 allele and lacks the genes encoding the major virulence factors (absence of cagPAI, babB, babC, sabB, and homB). Comparative genomics led to the identification of very few sequences that are unique to the B38 strain (9 intact CDSs and 7 pseudogenes). Pair-wise genomic synteny comparisons between B38 and the six sequenced H. pylori genomes revealed an almost complete co-linearity, never seen before, between the genomes of strain Shi470 (a Peruvian isolate) and B38. Conclusion: These isolates lack the main H. pylori virulence factors characterized previously, but are nonetheless associated with gastric neoplasia.

    Gene dispersion is the key determinant of the read count bias in differential expression analysis of RNA-seq data

    Background: In differential expression analysis of RNA-sequencing (RNA-seq) read count data for two sample groups, it is known that highly expressed genes (or longer genes) are more likely to be called differentially expressed, a phenomenon known as read count bias (or gene length bias). This bias has a great effect on the downstream Gene Ontology over-representation analysis. However, such a bias has not been systematically analyzed for different replicate types of RNA-seq data. Results: We show that the dispersion coefficient of a gene in the negative binomial modeling of read counts is the critical determinant of the read count bias (and gene length bias), using mathematical inference and tests on a number of simulated and real RNA-seq datasets. We demonstrate that the read count bias is mostly confined to data with small gene dispersions (e.g., technical replicates and some genetically identical replicates such as cell lines or inbred animals), and that many biological replicate datasets from unrelated samples do not suffer from such a bias, except for genes with small counts. It is also shown that the sample-permuting GSEA method yields a considerable number of false positives caused by the read count bias, while the preranked method does not. Conclusion: We showed for the first time that small gene variance (similarly, dispersion) is the main cause of read count bias (and gene length bias), and we analyzed the read count bias for different replicate types of RNA-seq data and its effect on gene-set enrichment analysis.
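    The role of dispersion can be illustrated with the standard negative binomial mean-variance relationship, Var = mu + phi * mu^2, which gives a squared coefficient of variation of 1/mu + phi. The sketch below is only an intuition aid under that common parameterization, not the derivation used in the paper; the function names are hypothetical. It compares the relative noise of a low-count gene and a high-count gene: when phi is small, the high-count gene is far less noisy (so it is easier to call as differentially expressed), while for large phi the two genes are almost equally noisy.

```python
def nb_squared_cv(mean_count: float, dispersion: float) -> float:
    """Squared coefficient of variation of a negative binomial count.

    Assumes the common parameterization Var = mu + dispersion * mu**2,
    so CV^2 = Var / mu**2 = 1/mu + dispersion.
    """
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    return 1.0 / mean_count + dispersion


def cv2_ratio_low_vs_high(low_mean: float, high_mean: float, dispersion: float) -> float:
    """Ratio of CV^2 for a low-count gene to CV^2 for a high-count gene.

    A ratio far above 1 means the high-count gene is much less noisy,
    which is the situation in which a read count bias can arise.
    """
    return nb_squared_cv(low_mean, dispersion) / nb_squared_cv(high_mean, dispersion)


if __name__ == "__main__":
    # Illustrative values only: 10 vs 1000 mean reads, small vs large dispersion.
    for phi in (0.001, 0.5):
        ratio = cv2_ratio_low_vs_high(10, 1000, phi)
        print(f"dispersion={phi}: CV^2(low) / CV^2(high) = {ratio:.2f}")
```

    With these illustrative numbers the ratio is large for the small dispersion and close to 1 for the large one, which matches the abstract's statement that the bias is mostly confined to data with small gene dispersions.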

    Improving gene-set enrichment analysis of RNA-Seq data with small replicates

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which results in a great number of false positives due to the inter-gene correlation in each gene-set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, a simulation method to generate correlated read counts within a gene-set was newly developed, and a dozen currently available RNA-seq enrichment analysis methods were compared; the proposed methods outperformed others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.
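    The idea of a one-tailed, gene-permuting test on absolute gene statistics can be sketched as follows. This is a deliberately simplified illustration, not the AbsFilterGSEA implementation: it swaps the weighted Kolmogorov-Smirnov running-sum statistic of GSEA for a plain mean of absolute gene scores, and the function name, arguments and defaults are all hypothetical.

```python
import random


def abs_gene_permutation_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """One-tailed gene-permutation p-value using absolute gene scores.

    scores   : dict mapping gene name -> signed gene statistic
    gene_set : iterable of gene names forming the set to test
    The set statistic is the mean |score| of its members (a simplification
    of the GSEA running-sum statistic). The null distribution comes from
    random gene sets of the same size drawn from all scored genes.
    """
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("gene_set shares no genes with scores")

    abs_scores = {g: abs(s) for g, s in scores.items()}
    observed = sum(abs_scores[g] for g in members) / len(members)

    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        null_stat = sum(abs_scores[g] for g in sample) / len(members)
        if null_stat >= observed:  # one-tailed: only large values count
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 10 set genes with strong scores of mixed sign, 90 weak background genes.
    toy = {f"set{i}": (5.0 if i % 2 else -5.0) for i in range(10)}
    toy.update({f"bg{i}": (0.1 if i % 2 else -0.1) for i in range(90)})
    print(abs_gene_permutation_pvalue(toy, [f"set{i}" for i in range(10)]))
```

    Taking absolute values before averaging lets a set whose genes move in both directions still score highly, whereas a signed average of the toy set above would cancel to zero.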

    Life in an arsenic-containing gold mine: genome and physiology of the autotrophic arsenite-oxidizing bacterium Rhizobium sp. NT-26

    Arsenic is widespread in the environment, and its presence results from natural or anthropogenic activities. Microbes have developed different mechanisms to deal with toxic compounds such as arsenic, either resisting or metabolizing the compound. Here, we present the first reference set of genomic, transcriptomic and proteomic data of an Alphaproteobacterium isolated from an arsenic-containing gold mine: Rhizobium sp. NT-26. Although phylogenetically related to the plant-associated bacteria, this organism has lost the major colonizing capabilities needed for symbiosis with legumes. In contrast, the genome of Rhizobium sp. NT-26 comprises a megaplasmid containing the various genes that enable it to metabolize arsenite. Remarkably, although the genes required for arsenite oxidation and flagellar motility/biofilm formation are carried by the megaplasmid and the chromosome, respectively, a coordinate regulation of these two mechanisms was observed. Taken together, these processes illustrate the impact environmental pressure can have on the evolution of bacterial genomes, improving the fitness of bacterial strains by the acquisition of novel functions.

    Gene expression clines reveal local adaptation and associated trade-offs at a continental scale

    Local adaptation, where fitness in one environment comes at a cost in another, should lead to spatial variation in trade-offs between life history traits and may be critical for population persistence. Recent studies have sought genomic signals of local adaptation, but often have been limited to laboratory populations representing two environmentally different locations of a species' distribution. We measured gene expression, as a proxy for fitness, in males of Drosophila subobscura, occupying a 20° latitudinal and 11 °C thermal range. Uniquely, we sampled six populations and studied both common garden and semi-natural responses to identify signals of local adaptation. We found contrasting patterns of investment: transcripts with expression positively correlated to latitude were enriched for metabolic processes and expressed across all tissues, whereas negatively correlated transcripts were enriched for reproductive processes and expressed primarily in testes. When using only the end populations, to compare our results to previous studies, we found that locally adaptive patterns were obscured. While phenotypic trade-offs between metabolic and reproductive functions across widespread species are well known, our results identify underlying genetic and tissue responses at a continental scale that may be responsible for this. This may contribute to understanding population persistence under environmental change.